Smart Paper Notes

MONAI: An open-source framework for deep learning in healthcare

M. Jorge Cardoso, Wenqi Li, Richard Brown, Nic Ma, Eric Kerfoot, Yiheng Wang, Benjamin Murrey, Andriy Myronenko, Can Zhao, Dong Yang

Categories: Machine Learning | Artificial Intelligence | Computer Vision

2022-11-04

Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geometry, physiology, physics) of the medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provides purpose-specific AI model architectures, transformations and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by and receiving contributions from research, clinical and industrial teams from around the world, who are pursuing applications spanning nearly every aspect of healthcare.

Translated by Google Translate

SongDriver: Real-time Music Accompaniment Generation without Logical Latency nor Exposure Bias

Zihao Wang, Kejun Zhang, Yuxing Wang, Chen Zhang, Qihao Liang, Pengfei Yu, Yongsheng Feng, Wenbo Liu, Yikai Wang, Yuntai Bao

Categories: Machine Learning

2022-09-13

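Real-time music accompaniment generation has a wide range of applications in the music industry, such as music education and live performance. However, automatic real-time accompaniment generation remains understudied and often faces a trade-off between logical latency and exposure bias. In this paper, we propose SongDriver, a real-time music accompaniment generation system without logical latency or exposure bias. Specifically, SongDriver divides an accompaniment generation task into two phases: 1) the arrangement phase, in which a Transformer model arranges chords for the input melody in real time and caches them for the next phase instead of playing them out; 2) the prediction phase, in which a CRF model generates playable multi-track accompaniment for the upcoming melody based on the previously cached chords. With this two-phase strategy, SongDriver directly generates the accompaniment for the upcoming melody and thus achieves zero logical latency. Furthermore, when predicting the chords for a timestep, SongDriver refers to the cached chords from the first phase rather than to its own previous predictions, which avoids the exposure bias problem. Since the input length is often constrained under real-time conditions, another potential problem is the loss of long-term sequential information. To compensate for this, we extract four musical features from the long-term music piece preceding the current timestep and use them as global information. In the experiments, we train SongDriver on several open-source datasets and on an original aiSong dataset built from scores of Chinese-style modern pop music. The results show that SongDriver outperforms existing state-of-the-art (SOTA) models on both objective and subjective metrics while significantly reducing physical latency.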

Entropy Induced Pruning Framework for Convolutional Neural Networks

Yiheng Lu, Ziyu Guan, Yaming Yang, Maoguo Gong, Wei Zhao, Kaiyuan Feng

Categories: Computer Vision

2022-08-13

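Structured pruning techniques have achieved strong compression performance on convolutional neural networks for image classification. However, most existing methods are weight-oriented, and their pruning results may be unsatisfactory when the original model is poorly trained. That is, a fully trained model is required to provide useful weight information. This can be time-consuming, and the pruning results are sensitive to the updating process of the model parameters. In this paper, we propose a metric named Average Filter Information Entropy (AFIE) to measure the importance of each filter. It is computed in three main steps: low-rank decomposition of the "input-output" matrix of each convolutional layer, normalization of the resulting eigenvalues, and calculation of filter importance based on information entropy. By leveraging AFIE, the proposed framework yields a stable importance evaluation of each filter regardless of whether the original model is fully trained. We implement AFIE on AlexNet, VGG-16, and ResNet-50, and test them on MNIST, CIFAR-10, and ImageNet, respectively. The experimental results are encouraging. Surprisingly, we observe that even when the original model is trained for only one epoch, the importance evaluation of each filter is identical to the result obtained when the model is fully trained. This indicates that the proposed pruning strategy can be applied effectively at the very beginning of the training process of the original model.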
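
The last two of the three steps named above (normalizing the eigenvalues and computing an entropy-based importance) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the eigenvalues of a layer's decomposed matrix are taken as given, the function name `average_filter_entropy` is hypothetical, and dividing the entropy by the number of filters is only one plausible reading of "average".

```python
import math

def average_filter_entropy(eigenvalues, num_filters):
    """Hypothetical AFIE-like score for one layer (illustrative only)."""
    # Step 2 (assumed form): normalize eigenvalue magnitudes so they sum to 1.
    magnitudes = [abs(v) for v in eigenvalues]
    total = sum(magnitudes)
    if total == 0 or num_filters <= 0:
        return 0.0
    probs = [m / total for m in magnitudes if m > 0]
    # Step 3 (assumed form): Shannon entropy of the normalized spectrum,
    # averaged over the filters of the layer.
    entropy = -sum(p * math.log2(p) for p in probs)
    return entropy / num_filters

# A flat spectrum carries more entropy per filter than a peaked one.
uniform_score = average_filter_entropy([1.0, 1.0, 1.0, 1.0], num_filters=4)
peaked_score = average_filter_entropy([4.0, 0.0, 0.0, 0.0], num_filters=4)
```

In this sketch a layer whose spectrum is dominated by a single eigenvalue scores lower than one whose eigenvalues are evenly spread.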

SBPF: Sensitiveness Based Pruning Framework For Convolutional Neural Network On Image Classification

Yiheng Lu, Maoguo Gong, Wei Zhao, Kaiyuan Feng, Hao Li

Categories: Computer Vision | Artificial Intelligence

2022-08-09

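Pruning techniques are widely used to compress convolutional neural networks (CNNs) for image classification. However, most pruning methods require a well pre-trained model to provide useful supporting parameters, such as the l1-norm, BatchNorm values, and gradient information, which may lead to inconsistent filter evaluation if the parameters of the pre-trained model are not well optimized. We therefore propose a sensitiveness-based method that evaluates the importance of each layer by adding extra damage to the original model. Because accuracy depends on the distribution of parameters across all layers rather than on individual parameters, the sensitiveness-based method is robust to parameter updates. That is, we obtain similar importance evaluations for each convolutional layer from imperfectly trained and fully trained models. For VGG-16 on CIFAR-10, even when the original model is trained for only 50 epochs, we obtain the same evaluation of layer importance as when the model is fully trained. We then remove filters from each layer according to the quantified sensitiveness. Our sensitiveness-based pruning framework is validated on VGG-16 with CIFAR-10, MNIST, and CIFAR-100.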

A Cooperative Perception Environment for Traffic Operations and Control

Hanlin Chen, Brian Liu, Xumiao Zhang, Feng Qian, Z. Morley Mao, Yiheng Feng

Categories: Robotics

2022-08-04

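Existing data collection methods for traffic operations and control usually rely on infrastructure-based loop detectors or probe vehicle trajectories. Connected and automated vehicles (CAVs) can report not only data about themselves but also the status of all detected surrounding vehicles. Integrating perception data from multiple CAVs and from infrastructure sensors (e.g., LiDAR) can provide richer information even at very low penetration rates. This paper develops a cooperative data collection system that integrates LiDAR point cloud data from both infrastructure and CAVs to create a cooperative perception environment for various transportation applications. State-of-the-art 3D detection models are used to detect vehicles in the merged point cloud. We test the proposed cooperative perception environment with a max-pressure adaptive signal control model in a co-simulation platform built on CARLA and SUMO. The results show that a very low CAV penetration rate combined with an infrastructure sensor is sufficient to achieve performance comparable to a connected vehicle (CV) penetration rate of 30% or higher. We also report the equivalent CV penetration rate (E-CVPR) under different CAV penetration rates to demonstrate the data collection efficiency of the cooperative perception environment.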
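
For readers unfamiliar with max-pressure control, the textbook form of the rule is short: each signal phase serves a set of movements, the pressure of a phase is the sum over its movements of the upstream queue minus the downstream queue, and the controller activates the phase with the largest pressure. The sketch below illustrates that generic rule only. It is not the paper's controller; the link names, queue counts, and function names are hypothetical, and in the setting described above the queue counts would come from the vehicles detected in the merged point cloud.

```python
def phase_pressure(phase_movements, queues):
    """Sum of (upstream queue - downstream queue) over a phase's movements."""
    return sum(queues[up] - queues[down] for up, down in phase_movements)

def select_max_pressure_phase(phases, queues):
    """Return the name of the phase with the largest pressure."""
    return max(phases, key=lambda name: phase_pressure(phases[name], queues))

# Hypothetical four-leg intersection: link names and vehicle counts are made up.
queues = {
    "north_in": 8, "south_in": 5, "east_in": 2, "west_in": 3,
    "north_out": 1, "south_out": 2, "east_out": 0, "west_out": 4,
}
# Each movement is an (upstream link, downstream link) pair.
phases = {
    "north_south_green": [("north_in", "south_out"), ("south_in", "north_out")],
    "east_west_green": [("east_in", "west_out"), ("west_in", "east_out")],
}
chosen = select_max_pressure_phase(phases, queues)
```

Because the rule needs only queue counts on the links around one intersection, its performance depends directly on how completely vehicles are observed, which is why the penetration rate matters in the experiments.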

Effective and Efficient Training for Sequential Recommendation Using Cumulative Cross-Entropy Loss

Fangyu Li, Shenbao Yu, Feng Zeng, Fang Yang

Categories: Machine Learning

2023-01-03

Research interest in sequential recommender systems is growing, with the aim of precisely modeling dynamic sequence representations. However, the most commonly used loss functions in state-of-the-art sequential recommendation models have essential limitations. To name a few, Bayesian Personalized Ranking (BPR) loss suffers from the vanishing gradient problem caused by numerous negative samples and prediction biases; Binary Cross-Entropy (BCE) loss is subject to the number of negative samples, so it is likely to ignore valuable negative examples and reduce training efficiency; Cross-Entropy (CE) loss focuses only on the last timestamp of the training sequence, which causes low utilization of sequence information and results in inferior user sequence representations. To avoid these limitations, we propose calculating a Cumulative Cross-Entropy (CCE) loss over the sequence. CCE is simple and direct, and it enjoys the virtues of painless deployment, no negative sampling, and effective and efficient training. We conduct extensive experiments on five benchmark datasets to demonstrate the effectiveness and efficiency of CCE. The results show that employing CCE loss on three state-of-the-art models, GRU4Rec, SASRec, and S3-Rec, yields average improvements of 125.63%, 69.90%, and 33.24% in full-ranking NDCG@5, respectively. With CCE, the performance curve of the models on the test data rises rapidly with wall-clock time and is superior to that of the other loss functions for almost the whole training process.
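
The contrast between CE on the last timestamp and a loss accumulated over the whole sequence can be sketched in a few lines. This is a minimal illustration, not the authors' code: the per-position logits over the full item set are hypothetical inputs, the function names are made up, and averaging (rather than summing) the per-position losses is an assumption.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one position over the full item set."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]

def last_step_ce(seq_logits, targets):
    """CE that only looks at the final position of the sequence."""
    return cross_entropy(seq_logits[-1], targets[-1])

def cumulative_ce(seq_logits, targets):
    """CE accumulated over every position (averaged here by assumption)."""
    losses = [cross_entropy(l, t) for l, t in zip(seq_logits, targets)]
    return sum(losses) / len(losses)

# Hypothetical scores over a 3-item catalogue at 3 positions of one sequence.
seq_logits = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 3.0]]
targets = [1, 0, 2]  # next item to predict at each position
ce_last = last_step_ce(seq_logits, targets)
ce_cumulative = cumulative_ce(seq_logits, targets)
```

Every position contributes a training signal in `cumulative_ce`, and because the softmax runs over the full item set, no negative sampling is involved, which matches the two properties the abstract attributes to CCE.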

Generalizable Black-Box Adversarial Attack with Meta Learning

Fei Yin, Yong Zhang, Baoyuan Wu, Yan Feng, Jingyi Zhang, Yanbo Fan, Yujiu Yang

Categories: Machine Learning | Computer Vision

2023-01-01

In the scenario of black-box adversarial attack, the target model's parameters are unknown, and the attacker aims to find a successful adversarial perturbation based on query feedback under a query budget. Due to the limited feedback information, existing query-based black-box attack methods often require many queries to attack each benign example. To reduce the query cost, we propose to utilize the feedback information across historical attacks, dubbed example-level adversarial transferability. Specifically, by treating the attack on each benign example as one task, we develop a meta-learning framework that trains a meta-generator to produce perturbations conditioned on benign examples. When attacking a new benign example, the meta-generator can be quickly fine-tuned based on the feedback information of the new task, as well as a few historical attacks, to produce effective perturbations. Moreover, since the meta-training procedure consumes many queries to learn a generalizable generator, we utilize model-level adversarial transferability to train the meta-generator on a white-box surrogate model and then transfer it to help the attack against the target model. The proposed framework, with its two types of adversarial transferability, can be naturally combined with any off-the-shelf query-based attack method to boost its performance, which is verified by extensive experiments.

Spatiotemporal implicit neural representation for unsupervised dynamic MRI reconstruction

Jie Feng, Ruimin Feng, Qing Wu, Zhiyong Zhang, Yuyao Zhang, Hongjiang Wei

Categories: Computer Vision

2022-12-31

Supervised Deep-Learning (DL)-based reconstruction algorithms have shown state-of-the-art results for highly undersampled dynamic Magnetic Resonance Imaging (MRI) reconstruction. However, their need for large amounts of high-quality ground-truth data hinders their application because of the generalization problem. Recently, Implicit Neural Representation (INR) has emerged as a powerful DL-based tool for solving inverse problems by characterizing the attributes of a signal as a continuous function of the corresponding coordinates in an unsupervised manner. In this work, we propose an INR-based method to improve dynamic MRI reconstruction from highly undersampled k-space data, which takes only spatiotemporal coordinates as inputs. Specifically, the proposed INR represents the dynamic MRI images as an implicit function and encodes them into neural networks. The weights of the network are learned only from the sparsely acquired (k, t)-space data itself, without external training datasets or prior images. Benefiting from the strong implicit continuity regularization of INR together with explicit regularization for low-rankness and sparsity, our proposed method outperforms the compared scan-specific methods at various acceleration factors. For example, experiments on retrospective cardiac cine datasets show an improvement of 5.5 to 7.1 dB in PSNR for extremely high accelerations (up to 41.6-fold). The high quality and inner continuity of the images provided by INR have great potential to further improve the spatiotemporal resolution of dynamic MRI, without the need for any training data.
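
To put the reported PSNR gains in perspective, recall the standard definition PSNR = 10 log10(peak^2 / MSE). At a fixed peak value, a gain of g dB therefore corresponds to the mean squared error shrinking by a factor of 10^(g/10), which is roughly 3.5 for 5.5 dB and 5.1 for 7.1 dB. The sketch below shows both relations; the function names and the toy signals are illustrative assumptions, not taken from the paper.

```python
import math

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length signals."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)

def mse_reduction_factor(psnr_gain_db):
    """Factor by which MSE shrinks for a given PSNR gain at a fixed peak."""
    return 10.0 ** (psnr_gain_db / 10.0)

# Toy example: a constant error of 0.1 on a signal with peak value 1.
toy_psnr = psnr([0.0, 0.0], [0.1, 0.1])

# MSE reduction implied by the two ends of the reported gain range.
low_factor = mse_reduction_factor(5.5)
high_factor = mse_reduction_factor(7.1)
```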

Memory Augmented Lookup Dictionary based Language Modeling for Automatic Speech Recognition

Yukun Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang

Categories: Natural Language Processing

2022-12-30

Recent studies have shown that using an external Language Model (LM) benefits end-to-end Automatic Speech Recognition (ASR). However, predicting tokens that appear less frequently in the training set is still quite challenging. Long-tail prediction problems have been widely studied in many applications, but they have been addressed by only a few studies for ASR and LMs. In this paper, we propose a new memory-augmented, lookup-dictionary-based Transformer architecture for LMs. The newly introduced lookup dictionary incorporates rich contextual information from the training set, which is vital for correctly predicting long-tail tokens. In intensive experiments on Chinese and English data sets, our proposed method is shown to outperform the baseline Transformer LM by a large margin on both word/character error rate and tail-token error rate. This is achieved without any impact on decoding efficiency. Overall, we demonstrate the effectiveness of our proposed method in boosting ASR decoding performance, especially for long-tail tokens.

Learning Implicit Functions for Dense 3D Shape Correspondence of Generic Objects

Feng Liu, Xiaoming Liu

Categories: Computer Vision

2022-12-29

The objective of this paper is to learn dense 3D shape correspondence for topology-varying generic objects in an unsupervised manner. Conventional implicit functions estimate the occupancy of a 3D point given a shape latent code. Instead, our novel implicit function produces a probabilistic embedding to represent each 3D point in a part embedding space. Assuming that corresponding points are similar in the embedding space, we implement dense correspondence through an inverse function that maps from the part embedding vector to a corresponding 3D point. Both functions are jointly learned with several effective and uncertainty-aware loss functions to realize our assumption, together with the encoder that generates the shape latent code. During inference, if a user selects an arbitrary point on the source shape, our algorithm can automatically generate a confidence score indicating whether there is a correspondence on the target shape, as well as the corresponding semantic point if there is one. Such a mechanism inherently benefits man-made objects with different part constitutions. The effectiveness of our approach is demonstrated through unsupervised 3D semantic correspondence and shape segmentation.
